Bug fix: fix the Round-Robin algorithm when "ReadMode == ReplicaReadMixed". #663

LykxSassinator · 2023-01-11T04:17:10Z

Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>

Bug report

While working on a related optimization of ReadMode, I found a bug in the Round-Robin strategy when ReadMode == ReplicaReadMixed.

Suppose we have a TiKV cluster with 3 nodes: [Node 1, Node 2, Node 3].
While the cluster is in a normal state, read traffic is distributed uniformly across the nodes.
If one node becomes abnormal, say Node 2, all traffic originally sent to Node 2 is redirected to Node 3.

With current Round-Robin strategy

//       [Node 1]                                         [Node 1]
//      /        \                                       /        \
//     /          \          [Node 2] abnormal ===>     /          \
//    /            \                                   /            \
// [Node 2] --- [Node 3]                            [Node 2] ---> [Node 3]
//                                                     [x]

This happens because of the following steps:

The initial state.lastIdx is generated randomly, so it can point to any one of [Node 1, Node 2, Node 3].

client-go/internal/locate/region_request.go

Lines 526 to 528 in f313ddf

    if state.lastIdx < 0 {
    	if state.tryLeader {
    		state.lastIdx = AccessIndex(rand.Intn(len(selector.replicas)))

If the previous step sets state.lastIdx to [Node 2], that index is rejected by isCandidate.

client-go/internal/locate/region_request.go

Lines 550 to 552 in f313ddf

    for i := 0; i < len(selector.replicas) && !state.option.leaderOnly; i++ {
    	idx := AccessIndex((int(state.lastIdx) + i) % len(selector.replicas))
    	if state.isCandidate(idx, selector.replicas[idx]) {

The check always starts at i == 0 with index state.lastIdx + i, so the first choice, [Node 2], is filtered out. The loop then continues with i == 1, reaches the next choice, [Node 3], which is a normal node, and exits.
As a result, all traffic destined for the abnormal [Node 2] is redirected to [Node 3], so the traffic no longer follows the expected uniform distribution.
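
The skew described above can be reproduced with a small simulation. This is only a sketch under stated assumptions, not the client-go code: pickLinear and simulateLinear are hypothetical names, and a plain boolean slice stands in for isCandidate. With Node 2 (index 1) marked abnormal, a uniformly random start index followed by a forward walk should give Node 1 roughly one third of the requests and Node 3 roughly two thirds.

```go
package main

import (
	"fmt"
	"math/rand"
)

// pickLinear is a hypothetical stand-in for the original probing: start at a
// random index and walk forward one step at a time until a healthy replica
// is found. It returns -1 when no replica is healthy.
func pickLinear(healthy []bool, rng *rand.Rand) int {
	n := len(healthy)
	start := rng.Intn(n)
	for i := 0; i < n; i++ {
		idx := (start + i) % n
		if healthy[idx] {
			return idx
		}
	}
	return -1
}

// simulateLinear issues `rounds` independent picks and counts how many
// requests land on each replica.
func simulateLinear(healthy []bool, rounds int, seed int64) []int {
	rng := rand.New(rand.NewSource(seed))
	counts := make([]int, len(healthy))
	for r := 0; r < rounds; r++ {
		if idx := pickLinear(healthy, rng); idx >= 0 {
			counts[idx]++
		}
	}
	return counts
}

func main() {
	// Index 0 = Node 1, index 1 = Node 2 (abnormal), index 2 = Node 3.
	healthy := []bool{true, false, true}
	counts := simulateLinear(healthy, 30000, 1)
	fmt.Println("requests per node:", counts)
}
```

Every request whose random start lands on Node 2 walks forward to Node 3, which is why Node 3 absorbs the whole share of the abnormal node.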

We ran a test, and the CPU metrics of the TiKV cluster confirmed this behavior.

What we expect

We expect that, if Node 2 is abnormal, all traffic originally sent to it is redistributed uniformly across the other nodes.

Solution

I made a minor optimization to the original Round-Robin strategy, shown below. After filtering out an abnormal node, we should randomly choose another one that meets the requirements.

for cnt := 0; cnt < replicaSize && !state.isCandidate(idx, selector.replicas[idx]); cnt++ {
	idx = AccessIndex((int(idx) + rand.Intn(replicaSize)) % replicaSize)
}

With this supplementary strategy in place, we get the expected results.
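
The effect of the random jump can be checked with the same kind of simulation. Again, this is a sketch under assumptions rather than the merged code: pickRandomJump and simulateRandomJump are hypothetical names, a boolean slice replaces isCandidate, and wrapping the jump loop inside a forward walk over the replicas is an assumption made here so that the function always finds a healthy replica when one exists. With Node 2 (index 1) abnormal, the shares of Node 1 and Node 3 should come out close to one half each.

```go
package main

import (
	"fmt"
	"math/rand"
)

// pickRandomJump is a hypothetical sketch of the proposed strategy: when the
// current index is unhealthy, jump by a random offset (which may be 0) up to
// n times instead of always stepping to the next neighbor.
func pickRandomJump(healthy []bool, rng *rand.Rand) int {
	n := len(healthy)
	start := rng.Intn(n)
	// Assumption: an outer forward walk acts as a fallback if all random
	// jumps for one starting point keep landing on unhealthy replicas.
	for i := 0; i < n; i++ {
		idx := (start + i) % n
		for cnt := 0; cnt < n && !healthy[idx]; cnt++ {
			idx = (idx + rng.Intn(n)) % n
		}
		if healthy[idx] {
			return idx
		}
	}
	return -1 // no healthy replica at all
}

// simulateRandomJump issues `rounds` independent picks and counts how many
// requests land on each replica.
func simulateRandomJump(healthy []bool, rounds int, seed int64) []int {
	rng := rand.New(rand.NewSource(seed))
	counts := make([]int, len(healthy))
	for r := 0; r < rounds; r++ {
		if idx := pickRandomJump(healthy, rng); idx >= 0 {
			counts[idx]++
		}
	}
	return counts
}

func main() {
	// Index 0 = Node 1, index 1 = Node 2 (abnormal), index 2 = Node 3.
	healthy := []bool{true, false, true}
	counts := simulateRandomJump(healthy, 30000, 1)
	fmt.Println("requests per node:", counts)
}
```

Because the jump offset is random, a request that starts on the abnormal node is equally likely to end up on either healthy node, so the redirected traffic is split instead of piling onto the next neighbor.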

Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>

LykxSassinator · 2023-01-11T04:19:39Z

@youjiali1995 PTAL, thx

sticnarf

LGTM

LykxSassinator added 2 commits January 11, 2023 11:07

Bugfix on node choice when "replica_read_mode = leader-and-follower".

1029759

Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>

Revoke unnecessary annotations' changes.

f1566cc

Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>

youjiali1995 requested review from youjiali1995 and sticnarf January 11, 2023 05:11

youjiali1995 approved these changes Jan 11, 2023


sticnarf approved these changes Jan 11, 2023


sticnarf merged commit c598334 into tikv:master Jan 11, 2023

LykxSassinator deleted the fix_round_robin branch January 11, 2023 06:34
